-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: correctly escape quotes in field names #596
Conversation
This is a preview of the changelog of the next release. If this branch is not up-to-date with the current main branch, the changelog may not be accurate. Rebase your branch on the main branch to get the most accurate changelog. Note that this might contain changes that are on main, but not yet released. Changelog: 0.2.22 (2024-09-30)Bug Fixes
|
db662df
to
55e262a
Compare
55e262a
to
9d37222
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting an error:
[2024-09-26 13:57:41.395] [logger] [debug] [preprocessing_database.cpp:73] Preprocessing Database - Result:
Parser Error: syntax error at or near ""lineage IS NULL THEN 'NULL'::VARCHAR ELSE '_' || pango_""
LINE 4: FROM (SELECT CASE WHEN pango_"lineage IS NULL THEN 'NULL'::VARCHAR ELSE '_' || pango_"lineage END AS partition_key, COUNT(*) AS count
FROM metadata_table
GROUP BY partition_key
ORDER BY partition_key);
...
^
[2024-09-26 13:57:41.400] [logger] [error] [api.cpp:282] Parser Error: syntax error at or near ""lineage IS NULL THEN 'NULL'::VARCHAR ELSE '_' || pango_""
LINE 4: FROM (SELECT CASE WHEN pango_"lineage IS NULL THEN 'NULL'::VARCHAR ELSE '_' || pango_"lineage END AS partition_key, COUNT(*) AS count
FROM metadata_table
GROUP BY partition_key
ORDER BY partition_key);
...
^
With testBaseData/exampleDataset/database_config.yaml
changed to
schema:
instanceName: sars_cov-2_minimal_test_config
metadata:
- name: gisaid_epi_isl
type: string
- name: date
type: date
- name: unsorted_date
type: date
- name: region
type: string
generateIndex: true
- name: country
type: string
generateIndex: true
- name: 'pango_"lineage'
type: pango_lineage
- name: division
type: string
generateIndex: true
- name: age
type: int
- name: qc_value
type: float
- name: test_boolean_column
type: boolean
primaryKey: gisaid_epi_isl
dateToSortBy: date
partitionBy: 'pango_"lineage'
defaultNucleotideSequence: "main"
and the respective small_metadata_set.tsv
header
gisaid_epi_isl pango_"lineage date region country division unsorted_date age qc_value test_boolean_column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything wrong with your changes--assuming that explicit escaping is the proper pragmatic decision to be taken right now. The safer solution is to never create SQL strings but use prepared statements and let the database handle correct value inclusion. Apparently DuckDB supports them[1]. This will be a somewhat larger change, depending on whether you want to keep the prepared statements around (may or may not be relevant for performance). Considering our tight time budget right now I guess it's fair to postpone that change.
[1] https://duckdb.org/docs/sql/query_syntax/prepared_statements.html
I'm approving in case you want to defer the prepared statements, also I haven't understood the test name arguments and am not implying to override those.
87efd6a
to
0bb630d
Compare
0bb630d
to
cbb8145
Compare
resolves #595
Summary
We did not correctly escape quotes in all cases. The detected error is added as a test and the corresponding bug is now fixed.
PR Checklist
- [ ] All necessary documentation has been adapted or there is an issue to do so.